Text File | 1991-07-02 | 11KB | 224 lines

Documentation for KWIC.PRG

Key Words in Context listings have been in use for several years by technical libraries. Essentially, they provide an abstracting service where no provider of such a service exists. Key words are extracted from the title of a book or journal article and sorted, and each key word is then printed in its original context; that is, within the full title of the work.

A key word is simply any word that is not on a list of words to be excluded. Excluded words include the articles and most prepositions and conjunctions. Thus, a person doing research on migraine headaches looks under 'migraine', 'headache', any other synonyms he knows, and under the drugs commonly prescribed for this terrible malady. Some of the titles will appear more than once during his search, of course, but once he finishes he knows that he has exhausted the information content of the set of titles.

The program's output can be read directly on the monitor instead of from a hard copy, but the printed listing offers portability. Also, I find the presence of the computer a distraction; one gets fascinated with the research tool instead of the problem being researched. (How many ex-chemists, biologists, astronomers, ..., are now computer programmers?)

Getting started

If you have a reasonably standard printer that is powered up and ready to print, you can demonstrate the program simply by executing it and giving it the default response (a carriage return) every time a question is asked. Sample files are included to be used as a base for a built-in demonstration.

How I use this program

I collect cookbooks. I have no data base program to assist me. There are several sub-categories to the collection: paperbacks, unusually prolific authors, books that are encyclopedic in nature, books too big for ordinary shelves, and so on. The sample file included here is one of the sub-collections. The main file of rather ordinary hardbound books is too large to include in an upload.

The main file starts with the book title in column 1 and has the author's name starting in column 40. (Note that the sample file included here is a _sub_ collection and does not have the author's name field.) The file is sorted by author name and the books are shelved in that order too. Sorting is done with a public domain sort program. Each book is represented by a one-line entry. This is necessary because the sort program I use demands a _consistent_ record length. That is, a record is zero or more characters followed by a CR and LF.

To make a KWIC listing, I specify the particular file of interest when 'input file' is called for. I often make composite listings including several sub-categories. To make a composite file, I find the Public Domain program PCOMMA (aka PCOMMAND), which emulates PC-DOS, invaluable. I have several .BATch files which, when run, join up the individual files in various ways and produce the desired composite file.

When the file of 'bad words' is called for, I use a personalized file. Some words are used so often _within_ a specialty that they become simply 'noise words'. In cookbooks, such words as 'cook', 'book', 'cookbook', and 'recipe' come up so often as to be meaningless.

When the program asks for columns to be ignored, I specify column 39. This means that the author's name is not a keyword. After all, I already have a listing sorted by author's name. I also sometimes have notes beyond column 39 and I want them ignored too, as far as key words are concerned.

When KWIC asks for the leftmost column for the keyword, I specify column 60. I then specify an Epson printer, with printing to be 137 columns wide. So I end up with a hard copy with key words nicely aligned on column 60, and the author's name is on the same line (but not aligned properly, unfortunately), so I can find the book physically without referring to another index.

That's how I use it. Now on to the general nature of the beast.

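For readers who think in code, the run described above boils down to roughly the following Python sketch. It is not the KWIC.PRG source (which is written in Personal Pascal) and it makes no claim to match the program's exact behavior. The function name kwic_lines, its parameter names, the rule for what counts as a word, and the reading of 'column 39' as 'ignore column 39 and everything to its right' are all assumptions made for illustration.

```python
import re

# Assumption: a "word" is a run of letters/digits, optionally with an
# apostrophe part such as "baker's". KWIC.PRG's real rule is not documented.
WORD = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z]+)*")


def kwic_lines(titles, bad_words, ignore_from=39, key_col=60, width=137):
    """Return one listing line per key word, sorted in ASCII order.

    ignore_from: first column (1-based) that is NOT searched for key words.
    key_col:     column (1-based) where every key word is made to start.
    width:       maximum width of an output line.
    All three names are hypothetical stand-ins for the program's prompts.
    """
    bad = {w.lower() for w in bad_words}
    found = []
    for title in titles:
        title = title.rstrip("\r\n")
        if not title.strip():
            continue  # blank lines in the input file are ignored
        searchable = title[:ignore_from - 1]  # author/notes are never key words
        for match in WORD.finditer(searchable):
            word = match.group()
            if word.lower() in bad:
                continue  # excluded ("bad") word, matched regardless of case
            # Slide the whole entry so the key word begins at key_col.
            shift = (key_col - 1) - match.start()
            line = " " * shift + title if shift >= 0 else title[-shift:]
            found.append((word, line[:width].rstrip()))
    # Plain string comparison gives ASCII order: digits, then A-Z, then a-z.
    found.sort(key=lambda pair: pair[0])
    return [line for _, line in found]


if __name__ == "__main__":
    sample = ["The Joy of Cooking", "", "Cooking with Wine"]
    for out in kwic_lines(sample, ["the", "of", "with"], key_col=20, width=60):
        print(out)
```

Because the whole entry is shifted to line up the key word, anything to the right of the title (the author's name, in my case) lands in a different column on each line, which is the same misalignment noted above.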
The Input file

The input file is prepared with your favorite text editor or a word processor in ASCII mode. It is a list of book titles, journal articles, or any analogous items. An entry normally starts in column 1 and can be as long as desired, within reason. The program will work best, however, with relatively short entries, say 80 characters or less.

Normal practice will result in most key words having the initial letter in upper case. The program will find them regardless of case. But after they are found they are sorted following the collating order of ASCII. That means that 'a' follows 'Z'. Numbers will be found as key words too. Sorting puts all digits ahead of all letters. Blank lines in the input file will be ignored.

The bad word file

You can make your own personalized bad word file by modifying the file included with the .ARC. It is a simple text file, too. Each word must start in column 1 and be followed _immediately_ by [Return]. That is, 'apple' is not the same as 'apple '. These words should all be lower case. You can enter the words in any order that occurs to you; the program will automatically do a simple re-sort of the bad word file every time it runs. You can have as many bad word files as you wish. The file included is specialized for cookbooks. About the first 80 entries would apply to any English title; simply remove the specialized words and replace them with your own set.

Program Output

After the program has run and extracted all the key words and sorted them, it is ready to provide output. Since the program may run for several minutes, it allows you to get several outputs from a single run of the program. The monitor choice is mostly offered as a preview to get an idea of whether things turned out OK. Since it is limited to 80 columns width, it is not very effective for long records. The basic output will often be to an Epson printer with the 137-column line choice.
This permits you to align the key words at, say, column 60 and get a nice-looking output with reasonably sized titles.

Non-Epson printers

If you have a non-Epson printer, there are two alternatives. The first alternative is to set up the printer to produce some kind of compressed printing _before_ you run KWIC. KWIC will not send anything except data to the printer if you specify non-Epson. You can also use this approach if you want more than 137 columns on an Epson printer; the printer can easily go to 160 columns and can even be pushed to exceed that.

The other alternative is to specify output to a file. This will be an ordinary ASCII file which you can read into a text editor, perhaps do further editing, and output the same way you would any other text file.

One word about writing to a file. The program uses the default Personal Pascal text file write, and it performs an incredible amount of slow activity on the disk. If you have a nervous temperament, as I do, and you see hundreds of writes to your disk, you may get very tense. The program works fine, but if this bothers you, write to a blank floppy disk (making things even slower!) and then copy that file to your hard disk. I could have speeded this write up, but considering the nature of the program, it just didn't seem worth the effort. Note also that the file produced can easily be quite large; one that I commonly produce is in excess of 200,000 bytes.

Loose ends

The program allows input files to be up to 110,000 bytes long and to have up to 8,000 key words. Normal printing would produce up to a 140 page listing. One of these sizes may be too small for your situation, or the ratio (the number of key words per title) may be wrong for you. These numbers were chosen to allow the program to run in a system that has about 250K bytes of free RAM. If you want a customized version, send me E-Mail on GEnie and I can probably make a special version for you to fit your needs.

The program doesn't care what file names or file name extensions you use; the names provided on the file selectors are merely suggestions.

For those interested in Pascal, note that the sort program included can be used as a debugged Quicksort. To customize it, simply change the type declarations and the SWAP procedure. The base routine is fast; it is slow as used here only because it uses Pascal string logic to compare two 11-character strings. This could easily be speeded up, but considering the nature of this program, it didn't seem worthwhile. The procedure that reads a file of arbitrary length into an ARRAY of characters might also be useful; it seems that so many programs start out (or should start out) by doing just that.

This program may be freely copied, uploaded, and propagated by any suitable means as long as the content passed on includes _all_ the files contained in the original ARChive. Additionally, the name should remain KWIC.ARC unless the target system already has that name in use.

Placed in public domain July 1991.

Merlin Hanson
GEnie address: M.L.HANSON